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Abstract 

In this paper, we introduce a new adaptive data analysis method to study trend 
and instantaneous frequency of nonlinear and non-stationary data. This method is 
inspired by the Empirical Mode Decomposition method (EMD) and the recently devel- 
oped compressed (compressive) sensing theory. The main idea is to look for the sparsest 
representation of multiscale data within the largest possible dictionary consisting of in- 
trinsic mode functions of the form {a(t) cos(0(i))}, where a £ V(8), V(6) consists of 
the functions smoother than cos(0(t)) and 6' > 0. This problem can be formulated as 
a nonlinear L° optimization problem. In order to solve this optimization problem, we 
propose a nonlinear matching pursuit method by generalizing the classical matching 
pursuit for the L optimization problem. One important advantage of this nonlinear 
matching pursuit method is it can be implemented very efficiently and is very stable 
to noise. Further, we provide a convergence analysis of our nonlinear matching pursuit 
method under certain scale separation assumptions. Extensive numerical examples will 
be given to demonstrate the robustness of our method and comparison will be made 
with the EMD/EEMD method. We also apply our method to study data without scale 
separation, data with intra-wave frequency modulation, and data with incomplete or 
under-sampled data. 

1 Introduction 

Developing a truly adaptive data analysis method is important for our understanding of 
many natural phenomena. Traditional data analysis methods, such as the Fourier transform, 
use pre-determined basis. They provide an effective tool to process linear and stationary 
data. However, there are still some limitations in applying these methods to analyze non- 
linear and nonstationary data. Time-frequency analysis has been developed to overcome 
the limitations of the traditional techniques by representing a signal with a joint function 
of both time and frequency. The recent advances of wavelet analysis have opened a new 
path for time- frequency analysis. A significant breakthrough of wavelet analysis is the use 
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of multi-scales to characterize signals. This technique has led to the development of several 
wavelet-based time- frequency analysis techniques [151 El EH] . 

Another important approach in the time-frequency analysis is to study instantaneous 
frequency of a signal. Some of the pioneering work in this area was due to Van der Pol [28] 
and Gabor [11], who introduced the so-called Analytic Signal (AS) method that uses the 
Hilbert transform to determine instantaneous frequency of a signal. This Analytic Signal 
method is one of the most popular ways to define instantaneous frequency. Until very 
recently, this method works mostly for monocomponent signals in which the number of 
zero-crossings is equal to the number of local extreme pQ. There were other attempts to 
define instantaneous frequency such as the zero-crossing method |25| [261 120] and the Wigner- 
Ville distribution method [H [T71 [231 El EH E2] • However most of these methods are rather 
restrictive. More substantial progress has been made only recently with the introduction 
of the EMD method [14]. The EMD method provides an effective tool to decompose a 
signal into a collection of intrinsic mode functions (IMF) that allow well-behaved Hilbert 
transforms for computation of physically meaningful time-frequency representation. We 
remark that the Hilbert spectral representation based on the wavelet projection has also 
been carried in |21j . 

Inspired by the EMD method and the recently developed compressed (compressive) 
sensing theory, we propose a data-driven time-frequency analysis method. There are two 
important ingredients of this method. The first one is that the basis that is used to de- 
compose the data is derived from the data rather than determined a priori. This explains 
the name "data-driven" in our method. The second ingredient is to look for the sparsest 
decomposition of the signal among the largest possible dictionary consisting of intrinsic 
mode functions. The adoption of this data-driven basis and the search for the sparsest 
decomposition over this highly redundant basis make our time-frequency analysis method 
fully adaptive to the signal. As we are going to demonstrate later, our method can reveal 
some hidden physical information of the signal, such as trend and instantaneous frequency. 

Our data-driven time-frequency analysis method is motivated by the observation that 
many multi-scale data often have an intrinsic sparse structure in the time- frequency plane, 
although its representation in the physical domain could be quite complicated. The chal- 
lenge is that such sparse representation is valid only for certain multiscale basis that is 
adapted to the data and is unknown a priori. Finding such nonlinear multiscale basis is 
an essential ingredient of our method. In this sense, our problem is more difficult than 
the compressed (compressive) sensing problem in which the basis is assumed to be known 
a priori. One way to find the adaptive basis is to learn from the data if we have a large 
number of data samples that share the similar physical property. This does not apply to 
our problem since we deal with only a single signal. We overcome this difficulty by reformu- 
lating the problem as a nonlinear optimization among the largest possible dictionary. The 
trade-off is that such decomposition is not unique. We need to exploit the intrinsic sparse 
structure of the data to select the sparsest one among all the possible decompositions. 
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In our method, the dictionary is given as following: 

V = {a(t)cos9{t) : 9'{t) > 0, a(t),9'(t) G V{9)} , (1) 

where V{6) is a linear space consisting of functions smoother than cos 9{t). The construction 
of V{9) with given 9(t) is in general an overcomplete Fourier basis given below: 

V{0) = span 1 1, cos (Jj!-\ , sin (^-\ , k = 1, ■ ■ ■ , 2XL e X , (2) 

where Lq = l_ -^-fer^ J , L/^J i s the largest integer less than /x, and A < 1/2 is a parameter 
to control the smoothness of V{9). We then decompose the signal over this dictionary 
by looking for the sparest decomposition. The sparest decomposition can be obtained by 
solving a nonlinear optimization problem: 

P : Minimize M (3) 

M 

Subject to: f{t) = a-kit) cos 9k(t), afc(i) cos 9k(t) G T>, k = l,- — ,M. 

k=l 

When the signal is polluted by noise, the equality in the above constraint is relaxed to be 
an inequality depending on the noise level. This optimization problem can be viewed as 
a nonlinear version of the L° minimization problem and is known to be very challenging 
to solve. Inspired by the compressed (compressive) sensing theory [2], we propose a l 1 - 
regularized nonlinear matching pursuit method to solve this nonlinear optimization problem. 

Our nonlinear matching pursuit is inspired by the linear matching pursuit method |18|. 
127] . We first extract an intrinsic mode function a(t) cos 9(t) G T> from the signal f(t) by 
looking for the one which matches the signal f(t) best among all the elements in T>. This 
would imply the following nonlinear optimization problem: 

Minimize 7||a(i)|| z i + ||/(i) - a(t) cos 0(t)\\p, Subject to a(t) cos 6(t) G T>, 

where 7 > is a regularization parameter and a(t) is the representation of a(t) in V{9). In 
some cases when the signals are periodic, we can choose 7 = 0. Denote by r(t) the residual 
after subtracting a(t) cos 9(t) from /(£), i.e. r(t) = f(t) — a(t) cos 9(t). We can then treat 
r(t) as a new signal to extract the remaining IMFs. There are two important advantages 
of this nonlinear matching pursuit approach. The first one is that this method is very 
stable to noise perturbation. The second one is that it can be implemented very efficiently. 
In the case of 7 = 0, the resulting method can be solved approximately by Fast Fourier 
Transform, and the complexity of our algorithm is of order O(NlogN) where N is the 
number of data sample points that we use to represent the signal. The low computational 
cost and the robustness to noise perturbation make this method very effective in many 
applications. Moreover, for data that satisfy certain scale separation conditions, we prove 
that our method recovers the IMFs and their instantaneous frequencies accurately. 
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We perform extensive numerical experiments to test the robustness and the accuracy of 
our data-driven time- frequency analysis method for both synthetic data and some real data. 
Our results show that the nonlinear matching pursuit can indeed decompose a multiscale 
signal into a sparse collection of intrinsic mode functions. We also compare our method 
with the original EMD method. For the data without noise, we find that our method gives 
results comparable to those obtained by the EMD method. Moreover, for noisy data, our 
method seems to provide better estimation of the instantaneous frequency and IMFs than 
EMD and recently developed EEMD method [30] . 

A common difficulty in many data analysis methods is the relatively large error produced 
near the boundary of the data set. For the EMD method, this source of error is referred 
to as the "end effect", which is primarily caused by the use of cubic spline interpolation 
in constructing the envelope and the median of the signal. Our data-driven time-frequency 
analysis method seems to be less sensitive to this end effect, especially when the data satisfy 
certain scale separation property. 

We have also extended our data-driven time-frequency analysis method to decompose 
data that do not have a good scale separation property. By incorporating a shape function 
into our dictionary that is adapted to the signal, we can also extend our method to decom- 
pose data with strong intra-wave modulation. Finally, we demonstrate that our data-drive 
time-frequency analysis method can be applied to recover the original signal with missing 
data in certain interval. The recovered signal as well as their instantaneous frequency seems 
to have reasonably good accuracy. We also apply our method to decompose under-sampled 
data. The result is quite encouraging even if the under-sampled data are polluted by noise. 

We remark that there has been some recent progress in developing a mathematical frame- 
work for an EMD like method using synchrosqueezed wavelet transforms by Daubechies, 
Lu and Wu [TJ. This seems to be a very promising approach. We have performed some 
preliminary numerical experiments to compare the performance of our method with the 
synchrosqueezed wavelet approach. In many cases, we find that the two methods give 
comparable and complementary results. We are currently exploring a hybrid approach that 
combines the advantages of our method with those of the synchrosqueezed wavelet approach. 
Our preliminary results seem quite encouraging. We will report this in a forthcoming paper. 

The remaining of the paper is organized as follows. In Section 2, we give a brief review of 
some existing data analysis methods such as the matching pursuit, the basis pursuit and the 
EMD method. We introduce our adaptive data analysis method. In Section 3, a simplified 
version of our data-driven time-frequency data analysis method is introduced for periodic 
data which can be implemented efficiently by using the Fast Fourier Transform. In Section 
4, we present some numerical experiments, include incomplete or under-sampled data, to 
demonstrate the performance of our method. In Section 5, we generalize our data-driven 
time-frequency data analysis method to analyze data with poor scale separation property 
and data with strong intra-wave modulation. We present some preliminary error analysis 
of our data-driven time-frequency data analysis method in Section 6. The technical proof 
of the main result is deferred to the Appendix. Some conclusions are made in Section 7. 
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2 Brief review of the existing sparse decomposition methods 



A considerable focus of activities in the recent signal processing literature has been the 
development of the sparse signal representations over a redundant dictionary. Among these 
methods, the matching pursuit [18] and the basis pursuit [5] have attracted a lot of attention 
in recent years due to the development of the compressed (compressive) sensing. All these 
methods consist of two parts: a dictionary to decompose the signal and a decomposition 
method to select the sparsest decomposition. 

2.1 Dictionaries 

A dictionary is a collection of parameterized waveforms T> = {(/> 7 } 7g r- Many dictionaries 
have been proposed in the literature. Here we review a few of them that have been used 
widely. 

A Fourier Dictionary. A Fourier dictionary is a collection of sinusoidal waveforms. More 
specifically, the waveforms consist of the following two families, 

4>ufl = cos(wt), = sin(u^). (4) 

For the standard Fourier dictionary, uj runs through the set of all cosines with Fourier 
frequencies uj^ = 2kir/n, k = 0, 1, ■ ■ ■ ,n/2, and all sines with Fourier frequencies uj^ = 
2kir/n, k = 1, • • • , n/2 — 1, where n is the number of sample points. We can also obtain an 
overcomplete Fourier dictionary by sampling the frequencies more finely. Let I > 1. We may 
choose Wfc = 2kn/ (In), k = 0, 1, • • • , ln/2 for cosines and uj^ = 2kn/ (In), k = 1, • • • , /n/2 — 1 
for sines. This is an /-fold overcomplete system. In the algorithm for non-periodic data, we 
will use this kind of overcomplete Fourier dictionary. 

A Wavelet Dictionary. A wavelet dictionary is a collection of translations and dilations 
of the basic mother wavelet ip, together with translations of the scaling function <p defined 
below: 

4>a,b,o = -7=^ ( -J , 4>a,b,i = -p<£ ( - — -J • (5) 

y/a \ a J yja \ a J 

For the standard wavelet dictionary, we let a, b run through the discrete collection of mother 
wavelets with dyadic scales a,j = 2 3 jn, j = jo, • • • , log 2 (n) — 1, and locations that are integer 
multiples of the scale bj t k = kaj, k = 0, • • • , 2 3 — 1, and the collection of scaling functions 
at the coarse scale jq. This dictionary consists of n waveforms, which form an orthonormal 
basis. As in the Fourier dictionary, an overcomplete wavelet dictionary can be obtained by 
sampling the locations more finely. 

A Time-Frequency Dictionary. A typical time-frequency dictionary is the Gabor dic- 
tionary due to Gabor (1946). In this dictionary, we take 7 = (uj, t, 9, 5), where uj £ [0, it) is 
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frequency, r is a location, 8 is a phase, and 5 is the duration. We define the waveform as 
follows: 



Such waveforms consist of frequencies near u and essentially vanish far away from r. 

An EMD Dictionary. We can also define a dictionary via the EMD method. In the EMD 
method [14j . the dictionary is the collection of all Intrinsic Mode Functions (IMF), which 
are functions defined descriptively by enforcing the following two conditions: 

1. The number of the extreme and the number of the zero crossings of the function must 
be equal or differ at most by one; 

2. At any point of the function, the average of the upper envelope and the lower envelope 
defined by the local extreme should be zero (symmetric with respect to zero). 

Inspired by the EMD method, we will use a variant of the EMD dictionary to construct 
a sparse decomposition of a signal via nonlinear optimization. 

2.2 Decomposition Methods 

In this subsection, we review a few decomposition methods that can be used to give a sparse 
decomposition of a signal by exploiting the intrinsic sparsity structure of the signal. In recent 
years, there have been a lot of research activities in looking for the sparest representation 
of a signal over a redundant dictionary [181 EJ EJ EJ H] , i.e. look for a decomposition of a 
signal / over a given dictionary V = {</> 7 } 7 er as 



with the smallest m, where R m is the residual. Whether or not a signal can be decomposed 
into a sparse decomposition depends on the choice of the dictionary that we use to decom- 
pose the signal. In general, a more redundant dictionary tends to give better adaptivity, 
which implies better sparsity of the decomposition. However, when the dictionary is highly 
redundant, the decompositions are not unique. We need to give a criterion to pick up the 
"best" decomposition among all the possible choices. 

Matching Pursuit. In [18], Mallat and Zhang introduced a general decomposition method 
called the matching pursuit that exploits the sparsity of a signal. Starting from an initial 
approximation s° = and a residual r° = s, the matching pursuit builds up a sequence 
of sparse approximations step by step. At stage k, the method identifies an atom that 
best matches the residual and then adds it to the current approximation, so that s k = 
s k ~ 1 + afc0 7 fe, where =< r , (j)^ > and r k = s — s k . After m steps, one has a 





rn 




(7) 



k=l 
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representation of the form ([7|), with residual R m = r m . A similar algorithm was proposed 
for Gabor dictionaries by S. Qian and D. Chen [24j . 

An intrinsic feature of this algorithm is that when stopped after a few steps, it yields 
an approximate sparse representation using only a few atoms. When the dictionary is 
orthogonal, the method works perfectly. If the dictionary is not orthogonal, the situation 
is less clear. Recently, J. Tropp and A. Gilbert proved that under some assumptions on the 
basis, the orthogonal matching pursuit can solve the original Iq minimization problem |27| . 

Basis Pursuit. Another important class of decomposition methods is the basis pursuit, 
which was introduced by S. Chen, D. Donoho and M. Saunders [5]. First, we reformulate 
the decomposition problem in the following way. Suppose we have a discrete dictionary of 
p waveforms and we collect all these waveforms as columns of an n by p matrix The 
decomposition problem (|7|) can be reformulated as: 

s = *a, (8) 

where a = (a 7 ) is the vector of coefficients in ([7]). 

The basic idea of the basis pursuit is to find a sparse representation of the signal whose 
coefficients have a minimal l\ norm, i.e. the decomposition is obtained by solving the 
problem 

min ||a[|/i, subject to <&a = s. (9) 

Recently, the basis pursuit has received a lot of attention, since it is found that under 
some conditions the basis pursuit can recover the exact solution of the original Iq minimiza- 
tion problem [31 18]. There has been extensive research to obtain a sparse representation by 
the basis pursuit in a variety of applications. An essential component of the basis pursuit is 
to solve the I 1 minimization problem. The computational cost of solving this I 1 minimiza- 
tion is more expensive than the least-square problem in the matching pursuit, although a 
powerful Split Bregman method has been introduced by Goldstein and Osher to speed up 
the I 1 minimization problem considerably |12| . 

The EMD decomposition via a sifting process. The EMD method decomposes a 
signal to its IMFs sequentially. The basic idea behind this approach is the removal of the 
local median from a signal by using a sifting process. Specifically, for a given signal, f(t), 
one tries to decompose it as a sum of the local median m(t), and an IMF. A cubic spline 
polynomial is used to interpolate all the local maxima to obtain an upper envelope, and to 
interpolate all the local minima to obtain a lower envelope. By averaging the upper and 
lower envelopes, one obtains an approximate median for m(t). One then decides whether or 
not to accept the obtained m(t) as our local median depending on whether f(t) — m(t) gives 
an acceptable IMF that satisfies the two conditions that are specified in the definition of an 
EMD dictionary. If f(t) — m(t) does not satisfy these conditions, one can treat f(t) — m(t) 
as a new signal and construct a new candidate for the IMF by using the same procedure 
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described above. This sifting process continues until we obtain a satisfactory IMF, which 
we denote as f n (t)- Now we can treat fit) — f n {t) as a new signal, and apply the same 
procedure to generate the second IMF, _f n _i(i). This procedure continues until fo(t) is 
either monotone or contains at most one extremum. For more details of the sifting process, 
we refer to [II] . 

Decomposition based on a nonlinear TV 3 minimization. Inspired by the EMD 
method, we proposed a decomposition method based on a nonlinear TV 3 minimization in 
our previous paper [13] . Here TV 3 is the total variation of the third order derivative of 
a function, defined as TV 3 (f) = J a \ f^(t)\dt. We use a TV 3 norm because the L 1 norm 
or the total variation norm is not strong enough to enforce the regularity of the median 
or the envelope of our decomposition. For example, the use of the total variation norm, 
which is very popular in imaging processing community, tends to give a decomposition 
whose median or envelope is piecewise constant, which is referred to as the stair-case effect. 
The TV 3 norm, on the other hand, gives a much smoother decomposition for both the 
median and the envelope. Incidentally, the minimization using the TV 3 norm tends to 
favor piecewise cubic polynomials such as cubic splines. Thus, our method gives results 
that are qualitatively similar to those obtained by the EMD method which uses cubic 
splines to construct its median and envelope from the local extrema of the signal. 

We now give a brief review of our TV 3 decomposition method. In our approach, every 
element in our dictionary automatically satisfies the conditions of IMF. There is no need 
to do any sifting or use the Hilbert transform in our method. First, we decompose a signal 
f(t) into its local median ao and an IMF aicos9(t) by solving the following nonlinear 
optimization problem: 

(P) Minimize TV 3 (a ) + TV 3 ( ai ) , (10) 
Subject to: a (t) + ai(t) cos 0(t) = f(t), 0'(t) > 0. 

To solve this nonlinear optimization problem, we proposed the following Newton type of 
iterative method: 

Initialization: 9° = 9q. 

Main Iteration: 

Step 1: Update tig, a™, b\ by solving the following linear optimization problem: 

Minimize TF 3 (a[f) + TU 3 «) + TV 3 ^), (11) 
Subject to : a£ + a? cos O 71 ' 1 ^) + b\ sinfl"-^*) = f(t). (12) 

Step 2: Update the phase function 9: 

9 n = 0n-l_ Maxctan Q|j ) (13) 
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where [i S [0, 1] is chosen to enforce that 9 n is an increasing function: 

fi = max | a e [0, 1] : (o^' 1 - a arctan (J^j ^ > J . (14) 

Step 3: If ||6» n - 9 n ~ 1 \\ 2 < e , stop. Otherwise, go to Step 1. 

In |13| . we performed a number of numerical experiments and compared the results with 
those obtained by the EMD (or EEMD) method. Our results show that this method shares 
many important properties with the original EMD method. Moreover, its performance does 
not depend on numerical parameters such as the number of sifting or the stop criterion, 
which seem to have a major effect on the original EMD method. 

There are two limitations of this approach. The first one is that the computational cost 
to solve the TV 3 minimization problem is relatively high, even if we use the Split Bregman 
method of Goldstein and Osher |12j . The second one is that this method is more sensitive 
to noise perturbation, although a nonlinear filter was introduced to alleviate this difficulty. 
In comparison, the nonlinear matching pursuit method we introduce in this paper is very 
stable to noise perturbation and has a relatively low computational cost. 



3 Sparse time- frequency decomposition method based on non- 
linear matching pursuit 

Our adaptive data analysis method is based on finding the sparsest decomposition of a 
signal by solving a nonlinear optimization problem. First, we need to construct a large 
dictionary that can be used to obtain a sparse decomposition of the signal. In principle, 
the larger the dictionary is, the more adaptive (or sparser) the decomposition is. 

3.1 Dictionary 

In our method, the dictionary is chosen to be: 

V = {a(t)cos9(t) : 0'{t) > 0, a(t),6'(t) is smoother than cos0(i)} . (15) 

Let V{9) be the collection of all the functions that are smoother than cos 9{t). In general, 
it is most effective to construct V{9) as an overcomplete Fourier basis given below: 



V{9) = span 1 1, cos (j^j ,sin^j , k =!,-■■ ,2AL<? j 



(16) 



where Lg = [ ffi^L-ffiQJ j ; [^J is the largest integer less than fx, and A < 1/2 is a parameter 
to control the smoothness of V(ff) . Then D can be written as 

V = {a(t)cos9{t) : 9'{t) > 0, a{t),9'(t) € V{9)} . (17) 
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In some sense, the dictionary V denned above can be considered as a collection of IMFs. 
This property makes our method as adaptive as the EMD method. We also call an element 
of T> as an IMF. Since the dictionary T> is highly redundant, the decomposition over this 
dictionary is not unique. We need a criterion to select the "best" one among all possible 
decompositions. We assume that the data we consider have an intrinsic sparse structure in 
the time-frequency plane in some nonlinear and nonstationary basis. However, we do not 
know this basis a priori and we need to derive (or learn) this basis from the data. Based 
on this consideration, we adopt sparsity as our criterion to choose the best decomposition. 
This criterion yields the following nonlinear optimization problem: 

P : Minimize M (18) 

M 

Subject to: fit) = a k (t) cos 9 k (t), a k (t) cos 0k(t) € T), k = l,---,M, 
k=i 

or 

P s : Minimize M (19) 

M 

Subject to: \\f(t) - ^ a k (t) cos 9 k (t)\y < S, a k {t) cos 9 k (t) € T>, k = l,---,M, 
k=l 

if the signal has noise with noise level 6. 

After this optimization problem is solved, we get a very clear time-frequency represen- 
tation: 

Frequency: co k (t) = 6 k (t), Amplitude : a k (t). (20) 



3.2 Nonlinear matching pursuit 

The above optimization problem can be seen as a nonlinear L° minimization problem. 
Thanks to the recent developments of the compressed (compressive) sensing, it is now well 
known that there are two kinds of methods to solve a L° minimization problem: the basis 
pursuit and the matching pursuit. Since the dictionary we adopt here has infinitely many 
elements, the generalization of the basis pursuit is not so straightforward. But the idea of 
the matching pursuit can be generalized. Applying the idea of the matching pursuit to our 
problem, we obtain the following algorithm: 

• r° = f(t). 

Step 1: Solve the following ^-regularized nonlinear least-square problem (P2): 

P2 : Minimize 7||afc||/i + ||r fc_1 — a k cos6 k \\p (21) 

Subject to: 0' k > 0, a k {t) G V(9 k ), . 

where 7 > is a regularizing parameter and a k is the representation of a k in the V(9 k ) 
space. 
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Step 2: Update the residual 

k 

f Jl = f(t)-^2a j (t)co S e j (t). (22) 

3=1 

Step 3: If \\r k \\i2 < eo, stop. Otherwise, go to Step 1. 

In the first step of the above iterative algorithm, unlike the standard matching pursuit 
, we solve a I 1 regularized nonlinear least-square problem Pi- To solve this problem, we use 
the following Gauss-Newton type method: 

• 01 = O . 

Step 1: Solve the following l\ regularized least-square problem: 

P 2 , l2 Minimize j(\\a n k +1 h + l^lliO + llr*" 1 - af 1 (t) cos ff£{t) - b n k +1 (t) sin 9 n k {t)\\l 

Subject to a n k +1 (t), b n k +l {t) £ V{6%). 

where a k +1 ,b k +1 are the representations of a k +1 ,b k +1 in the V(9 k ) space. 
Step 2: Update 0£, 

^ +1 = ^-Aarctan^|^^ , (23) 
where AG [0, 1] is chosen to make sure that 6l +1 is a monotonely increasing function. 

A = max { a e [0, 1] : [ 0£ - a arctan [ J J > ol . (24) 



dt 



Step 3: If ||6»^ +i - 0%\\p < e , stop. Otherwise, go to Step 1. 

In the first step of above algorithm, we solve a l\ regularized least square problem. The 
l l regularization tends to stabilize the least square problem using an overcomplete Fourier 
basis. It also favors a sparse decomposition of the data. 



3.3 A fast algorithm for periodic data 

In the iterative algorithm given in previous section, we need to solve a l\ regularized least 
square problem in each step. This is the most expensive part of the algorithm especially 
when the number of the data points is large. In this subsection, we consider the special 
case when the data are periodic. In this case, we can introduce an method based on Fast 
Fourier Transform(FFT) instead of solving the l\ regularized least square problem. 
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One big advantage for periodic data is that we can use the standard Fourier basis to 
construct the V{9) space in the following way instead of the overcomplete Fourier basis 
given in ([2]). 

V{9) = span jcos (j^j ,sin (j^j : k = 0,---,\L e , I = l,---,XL e \ , (25) 

where A < 1/2 is a parameter to control the smoothness of functions in V{9) and Lg = 
(0(1) — #(0))/27r is a positive integer. 

The standard Fourier basis is orthogonal to each other, so the l\ regularized term is not 
necessary in this case. Then, the least-square problem that we need to solve in the iterative 
algorithm is described below (recall that we set 7 = 0): 

Minimize \\r(t) - a(t) cos 6 (t) - b(t) sin0(i)||f 2 (26) 
Subject to a(t), b(t) £ V{6). 

In order to simplify the notations, we drop the subscript and superscript here. 

Notice that in the iterative process, the derivative of the phase function 6(t) is always 
monotonically increasing. Thus, we can use 0(t) as a new coordinate. In this new coordinate, 
cos 9, sin 9 and the basis of V{9) are simple Fourier modes, then the least-square problem 
can be solved by using the Fourier Transform. More specifically, for a given increasing 
phase function 9(t), we have the following algorithm to solve the least-square problem (|26p 
approximately: 

Step 1: Apply an interpolation method to obtain the value of r(t) at a uniform mesh in the 
^-coordinate, rg(9j): 

r e (9 j ) = Interpolate (9(U),r, 9j) , (27) 
where 6j, j = 1, ■ ■ ■ , N are uniformly distributed in the 0-coordinate. 

Step 2: Apply the low-pass filter x{k) to the Fourier Transform of rg to compute the envelope 
a(t) and b(t): 

a(t) = 2Re{T- 1 [? e (k + l)x(k)}}, (28) 
b(t) = 2Im{T- 1 [re(k + l)x(k)}}, (29) 

where x(k) will be given later. 

The low-pass filter x(k) is determined by the choice of V{9). The definition of V(9) in 
(I25p implies that x(k) a stair function given as following: 

X(k) = { 1 > ~ X < k<X (30) 
0, otherwise. 
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In the theoretical analysis in the subsequent section, we will see that the stair function 
is not a good choice as a filter. We can define a different space for V{9) by choosing an 
appropriate x(A:). This opens up many choices for V{9). In this paper, we choose the 
following low-pass filter x{k) to define V{9): 

J 1 + cosvrA;, -1 < k < 1 
X{k) = \ 0, otherwise. (31) 

Now, by incorporating the FFT-based solver in our iterative algorithm, we get the 
following FFT-based iterative algorithm: 

• ol = e . 

Step 1: Interpolate r(t) to a uniform mesh in the ^"-coordinate to get rgn(6>"): 

rgn (9J) = Interpolate (6 n (U), r, 9]) , (32) 
where 6>", j = 1, • • • , N are uniformly distributed in the ^"-coordinate. 

Step 2: Apply to the Fourier Transform of rgn to compute a" +1 and b n+1 on the mesh 
of the ^"-coordinate: 

a n+ \9 n ) = 2Re{F- 1 [?e4k + l)x(k)]}, (33) 
b n+1 (9 n ) = 2Im{F- 1 [^(k + l)x(k)]}. (34) 

Step 3: Interpolate a n+l {6 n ) and b n+l {6 n ) back to the uniform mesh of t: 

a n+1 (ti) = Interpolate (0?, a n+1 {9]),t i ) , (35) 
b n+1 (ti) = Interpolate (9J, b n+l {9]) J t i ) . (36) 

Step 4: Update 9 n in the t-coordinate: 

^^-Aarctanf-^J , (37) 

where A 6 [0, 1] is chosen to make sure that 9^ +1 is monotonically increasing: 

A = max jae [0,1] : ^ ^ - a arctan ^~S+T j j - °| ■ ( 38 ) 

Step 5: If \\9^ +l - 0% || 2 < e , stop. Otherwise, go to Step 1. 
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In our implementation, instead updating the phase function 9, we update the instanta- 
neous frequency 6'(t): 

n n+l l h n+l\' _ h n+X f_r 

(e n k +l )' = (9 n k y + \A6 n k +1 , A0 n k +l = fc [bk -4 k !* 

The phase function can be reconstructed from the instantaneous frequency by integration. 
The integration constant is set to since this constant does not change the instantaneous 
frequency. 

In the formula (f39|) to update 9, at the points where the denominator + (b^. +1 ) 

is small, the error maybe amplified, then our algorithm may become unstable. In the real 
computation, we first set a threshold value a, at the points where the denominator is smaller 
than a, the value of A9 is interpolated by the value A9 at other points, its original value is 
not used. In the computations of this paper, a is set to be 0.1. 

Above algorithm is based on the Gauss-Newton type iteration. It is known that this 
kind of iterations is sensitive to the initial value. It is not practical to assume that we can 
get a good initial guess especially when the signal is polluted by noise. In order to abate 
the dependence on the initial guess, we use the following procedure to gradually improve 
our approximation to the phase function so that it converges to the correct value. 

First, for a given phase function 9, we define a set of spaces, such that 

Vi{9) c7j(«)C-C V K {9) = V(9) (40) 

In the computation, we first limit A9 in corresponding V\{9) space, do the iteration until 
converge, then relax the restriction on A9 to make it in V^(^), repeat the iteration until 
converge. Repeat this process until the algorithm converge with the restriction of A9 in 
V{9). In the real computations, this process converge even from very rough initial guess 
although we can not prove that. 

This procedure is also very easy to implement. First, we apply a narrow low-pass filter 
on AO, then make the filter broader and broader until it approach the filter given in (|3ip . 

In the above algorithm, we need to perform the Fourier Transform. In general, this 
works well only for periodic data. For a non-periodic signal with good scale separation 
property, we can extend the signal by a mirror reflection and treat the extended signal as a 
periodic signal. The result is still quite reasonable. But for nonperiodic data with poor scale 
separation property, the ^-regularized nonlinear matching pursuit is necessary to reduce 
the end effect, as we will see in Sections 4 and 5. 

The initial guess of 9 can be generated by other time-frequency analysis methods, such as 
the synchrosqueezed wavelet transforms [TJ. In the following numerical examples, we obtain 
our initial guess using a simple approach based on the Fourier Transform. More precisely, by 
estimating the wavenumber by which the high frequency components are centered around, 
we can obtain a reasonably good initial guess for 9. The initial guess for 9 obtained in this 
way is a linear function. As we will see in the following numerical examples, even with these 
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(39) 



relatively rough initial guesses for 6, our algorithm still converges to the right answer with 
accuracy determined by the noise level. 

Last point we want to remark is that our method has a close connection to the wavelet 
transform. In some sense, our method is equivalent to employ wavelet transform in the 
^-coordinate in each step. The low-pass filter x(k) used in our method can be related with 
the scale function in the multiresolution analysis. And space V(9) can also be constructed 
following the idea of multiresolution analysis. The relation between our method and the 
wavelet transform will be studied more systematically in our future papers. 

4 Numerical results 

In this section, we will perform extensive numerical studies to demonstrate the effectiveness 
of our nonlinear matching pursuit method. First we will present numerical results for the 
FFT-based algorithms for periodic data or data with a good scale separation property (see 
section [6] for the definition of the scale separation property). In the second section, we 
will present numerical results for the I 1 regularized nonlinear matching pursuit which gives 
reasonably accurate decompositions for non-periodic data and even for under-sampled data 
or data with missing information in some physical domain. 

4.1 Numerical results for the FFT-based algorithms 

In this section, we will present a number of numerical experiments to demonstrate the 
accuracy and robustness of our method. We also compare the performance of our method 
with that of EMD or EEMD. A main focus of our numerical study is the robustness of 
the decomposition to signals that are polluted with a significant level of noise. When the 
signal is free of noise, we observe that the performance of our method is comparable to 
that of the EMD/EEMD method. However, when the signal is polluted by noise with a 
significant noise-to-signal ratio, our nonlinear matching pursuit method tends to give better 
performance than that of the EMD/EEMD method. 

Throughout this section, we denote X(t) as white noise with zero mean and variance 
a 2 = 1. The Signal-to-Noise Ratio (SNR, measured in dB) is defined by 



We will apply our method to several different signals with increasing level of difficulty. 

Example 1 The first example that we consider is a simple non-stationary signal con- 
sisting of a single IMF, which is given below 



In Fig. [TJ we plot the original signal on the left column and the instantaneous frequency 
on the right column. The curve with the red color corresponds to the exact instantaneous 




(41) 



f(t) = cos(60vrf + 15sin(2vrt)). 



(42) 
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frequency and the one with the blue color corresponds to the one obtained by by our 
method. The top row corresponds to the original signal without noise, f(t). The middle row 
corresponds to the same signal with a moderate noise level (f(t)+X(t), SNR=— 3.01dB) and 
the bottom row corresponds to the signal with large noise (f(t) + 3X(t), SNR=— 12.55dB). 
In the case when no noise is added, the instantaneous frequency obtained by our method 
is almost indistinguishable from the exact one, see the first row of Fig. [TJ When the signal 
has noise, our method can still extract the instantaneous frequency and corresponding IMF 
with reasonable accuracy, see Fig. Q] and Fig. EJ 

In Fig. El we compare the IMFs extracted by our method with those obtained by the 
EMD/EEMD method. For the signal without noise, we use the EMD method to decompose 
the signal. For the signal with noise, we use the EEMD method to decompose the signal. 
In the EEMD approach, the number of ensembles is chosen to be 200 and the standard 
deviation of the added noise is 0.2. In each ensemble, the number of sifting is set to 8. Even 
though the signal we consider has only a single IMF, the EEMD method still produces 
several IMFs. Among different components of IMFs that are produced by the EEMD 
method, we select the one that is closest to the exact IMF in I 2 norm and display it in Fig. 

El 

When the signal does not have noise, both our method and the EMD method produce 
qualitatively the same result for this simple signal, see the first row of Fig. El When noise 
is added, the situation is quite different. The IMFs extracted by our method still have 
reasonable accuracy. However, the IMF decomposed by EEMD fails to capture the phase 
of the exact IMF in some region. As a consequence, the accuracy of the instantaneous 
frequency obtained by the EEMD method is very poor (not shown here). 

Example 2 

Now, we consider a signal that consists of three IMFs. 

fit) = —. cos(60vrt + 15 sin(2vrt)) H l — cos(160vrt + sin(16vrt)) 

J w 1.5 + cos(2vrt) v v 1.5 + sin(2vrt) v 1 

+(2 + cos(8vri)) cos(140vr(t + l) 2 ). (43) 

In Fig. [3l we study the accuracy of the instantaneous frequencies obtained by our method 
with the exact instantaneous frequencies. The upper row corresponds to the signal without 
noise. As we can see, it is hard to tell any hidden structure from this signal even without 
noise. Our method recovers the three components of the instantaneous frequencies (blue) 
that match the exact instantaneous frequencies (red) extremely well. They are almost 
indistinguishable from each other. In the case when noise is added to the original signal, 
the polluted signal looks really complicated and one can not recognize any hidden pattern 
from this polluted signal. It is quite amazing that our method could still recover the three 
components of the instantaneous frequencies with accuracy comparable to the noise level, 
see the bottom row of Fig. [3l 
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Figure 1: Top row: left: the original signal defined by (|42p without noise; right: Instan- 
taneous frequencies; red: exact frequency; blue: numerical results. Middle row: the same 
as the top row except white noise X(t) is added to the original signal, the corresponding 
SNR is —3.01 dB. Bottom row: White noise 3X(t) is added to the original signal, the 
corresponding SNR is —12.55 dB. 
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Figure 2: The IMFs extracted by our method and EMD/EEMD method. Left column: 
IMFs extracted by our method; Right column: IMFs obtained by EMD/EEMD method. 
Top row: IMFs from f(t); Middle row: IMFs from f(t) + X(t); Bottom row: IMFs from 
f(t) + 3X(t). f(t) is defined in g2J). 
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Figure 3: Upper row: left: the signal denned in (|43p without noise; right: Instantaneous 
frequencies; red: exact frequencies; blue: numerical results. Lower row: the same as the 
upper row except that white noise 2X(t) was added to the original signal, the corresponding 
SNR is -0.8 dB. 
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Figure 4: The daily Length-of-Day data from Jan 20, 1962 to Jan 6, 2001. 
Real Data: Length-of-Day Data 

Next, we apply our method to the Length-of-Day data, see Fig. [H The data we adopt 
here was produced by Gross [10J, covering the period from 20 January 1962 to 6 January 
2001, for a total of 14,232 days (approximate 39 years). In our previous paper [13] , we 
also studied this data set. Due to the high computational cost associated with the I 
minimization, we can not decompose the entire data set. Instead, we decompose a segment 
of the data that contains 700 consecutive days. Thanks to the low computational cost of 
the FFT-based nonlinear matching pursuit method, we can now study the entire data set 
without any compromise. 

Fig. [5] displays the first 5 IMFs extracted by the FFT-based method. These IMFs 
are sorted by their frequencies from high to low. We note that the results obtained by 
our method do not suffer from the mode mixing phenomenon that is present in the EMD 
decomposition. Moreover, each component is enforced to be an IMF by the construction of 
our dictionary. Thus, there is no need to do shifting or post-processing as was done in the 
EMD or the EEMD method. And the IMFs we got are qualitatively match those obtained 
by EEMD with post-processing [30J. 
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ure 5: The first 5 IMFs with highest frequencies given by our FFT-based method. 
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It is interesting to note that the each IMF that we obtain has a clear physical interpre- 
tation. For example, the period of C\ is around 14 days, corresponding to the semi-monthly 
tides. The period of C2 is about 28 days, corresponding to the monthly tides. Similarly, the 
period of C4 is about half a year, corresponding to the semi-annual cycle and C5 corresponds 
to the annual cycle. 

4.2 Numerical results for the I 1 regularized nonlinear matching pursuit 

The nonlinear matching pursuit method based on the Fourier Transform works well only 
for periodic data. For non-periodic data or data with poor scale separation, the results 
obtained by this method tend to produce some oscillations near the boundary. This so- 
called "end effect" is also present in the EMD method and other data analysis methods. In 
our method, the "end effect" comes from the use of the Fourier Transform in the algorithm. 
To remove this "end effect" error, we need to use the I 1 regularized nonlinear matching 
pursuit described in Section 3.2 with V{9) being the overcomplete Fourier basis defined in 



4.2.1 Numerical results for non-periodic data 

In this subsection, we perform a numerical experiment to test the effectiveness of our I 1 
regularized nonlinear matching pursuit for non-periodic data. We first consider the following 
data in our experiment: 

6»i = 20vr(t + l) 2 + 1, 6 2 = 161.4vri + 4(1 - t) 2 sin(16vrt), 

/(*) = : =- 1 n , T, + (2* + 1) cos 6 l + (2 - tf cos 9 2 . (44) 
1.5 + sm(1.57rij 

In this numerical example, the parameter 7 is chosen to be 1. From Fig. [61 we observe that 
the I regularized nonlinear matching pursuit seems to produce considerably smaller error 
near the boundary for this non-periodic signal. 

4.2.2 Numerical results for data with incomplete or sparse samples 

The l\ regularized nonlinear matching pursuit can also handle the incomplete data and 
the data with sparse samples. We illustrate this property of our method through a few 
examples. 

The first example is an incomplete signal given by (JJ5]). 



6{t) = 120vrt + 10cos(47rt), a(t) = 2 + cos(2vri), f(t) = a(t) cos 9{t), t £ [0, 0.4] U [0.6, 1]. (45) 

For this signal, we have only eighty percent of the original data and miss twenty percent 
of the data in the gap interval [0.4,0.6]. In Fig. [71 we plot the recovered signal in the gap 
interval [0.4,0.6] (see the middle panel). The recovered signal matches the original signal 
almost perfectly in the gap interval. The recovered instantaneous frequency also matches 
the exact instantaneous frequency with high accuracy. 
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Figure 6: IMF (left) and Instantaneous frequency (right) of the signal in (|44p obtained from 
different methods. Red: exact; Blue: I 1 regularized nonlinear matching pursuit; Black: 
FFT-based algorithm. 

In Fig. [8J we perform the same numerical experiment by enlarging the interval of 
missing data from (0.4,0.6) to (0.3,0.7), i.e. we miss forty percent of the data. Even for 
this more challenging example, our method still gives quite reasonable reconstruction of 
the original data in the region of missing data. The recovered instantaneous frequency still 
approximates the exact instantaneous frequency with reasonable accuracy, especially away 
from the region of missing data. 

Finally, we consider an example with insufficient samples. The signal is generated as 
follows: 

9(U) = UOirti + 10cos(2vrti), afc) = 2 + cos(2vrii), f{U) = afc) cos 9{U), U G [0,1]. (46) 

and i = 1,2, • • • , N . The location ti is chosen randomly in [0,1]. In this example, the 
number of samples is 64. This means that we have about one sample point within one 
period of the signal on average. 

Fig. [9] gives the results obtained by our method. In the case of insufficient samples 
without noise, the recovered signal and the original signal are almost indistinguishable. 

We now add Gaussian noise to the original samples and apply our method to this noisy 
data. The result is given in Fig. [TUJ In this case, the noise 0.2X(t) is added to the origianl 
signal f(t) given in (146 j) . We can see that both the recovered signal and the instantaneous 
frequency still have reasonable accuracy. This shows that our method is stable with respect 
to noise perturbation even for sparse under-sampled data. 

In the future, we plan to perform some theoretical study of our method for data with in- 
complete or sparse samples and carry out more numerical experiments for some complicated 
or real data. 
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Figure 7: Left: blue: the original incomplete data, the gap is (0.4, 0.6); red: the missing data 
recovered by our method; Middle: the recovered missing data, red: exact; blue: numerical. 
Right: the instantaneous frequencies, red: exact; blue: numerical. 
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Figure 8: Left: blue: the original incomplete data, the gap is (0.3, 0.7); red: the missing data 
recovered by our method; Middle: the recovered missing data, red: exact; blue: numerical. 
Right: the instantaneous frequencies, red: exact; blue: numerical. 
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Figure 9: Left: original samples, red: exact; blue: recovered; '*' represent the sample points. 
Right: instantaneous frequency, red: exact; blue: numerical. 
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Figure 10: Left: original samples, red: exact; blue: recovered from the noised data, f{U) + 
0.2X (ti); '*' represent the sample points. Right: instantaneous frequencies, red: exact; 
blue: numerical. 
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5 Generalizations for the / regularized nonlinear matching 
pursuit 

The iterative algorithm based on I 1 regularized nonlinear matching pursuit can also be 
generalized to deal with more complicated data, such as the data with poor scale-separation 
and the data with intra-wave frequency modulation. In this section, the power of this kind 
of algorithm is shown through several numerical examples, more details can be presented 
in our subsequent papers. 

5.1 Numerical results for data with poor scale-separation property 

In the previous sections, we show that for data with a good scale-separation property, 
the algorithm based on I 1 regularized nonlinear matching pursuit can give an accurate 
decomposition. Now, we give a brief discussion how to decompose data with poor scale- 
separation property. 

Let / be a signal that has the following sparse decomposition over dictionary T>: 

M 

f(t) = ^JafcCOs6>&, a k cos9k£V, (47) 

k=l 

where T> is defined in (|17j) . But now the instantaneous frequencies 0' k (t) are not well sepa- 
rated, so f(t) does not satisfy the scale-separation condition defined in Section [6] 

It is well known that for data consisting of components with interfering frequencies, the 
matching pursuit with a Gabor dictionary may not give a sparse decomposition [19]. Since 
our method is based on the matching pursuit, it is not surprising that it may not be able 
to generate the sparsest decomposition either. 

To illustrate, we consider the following signal consisting of two IMFs whose instanta- 
neous frequencies intersect each other. The signal is generated by the analytical formula 
given below. 

f(t) = cos(20vrt + 40vri 2 + sin(2vrt)) + cos(40vrt). (48) 

Fig. [11] plots the instantaneous frequencies and IMFs recovered by the nonlinear match- 
ing pursuit given in the previous section. Near the point of intersection, both the instanta- 
neous frequencies and IMFs produce noticeable errors. The good news is that the instanta- 
neous frequency recovered by our method is still in phase with the exact one. Furthermore, 
the accuracy is quite reasonable in the region far away from the point of intersection. This 
shows that our method has a temporal locality property, which is important in many phys- 
ical applications. 

To further improve the accuracy of our decomposition when there are a number of instan- 
taneous frequencies that are not well separated, we need to decompose these components 
simultaneously since these IMFs have strong correlation. Assume that we can learn from 
our I 1 regularized nonlinear matching pursuit that there are M components of IMFs whose 
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Figure 11: Left: Instantaneous frequencies; red: exact frequencies; blue: numerical results. 
Middle and Right: IMFs extracted by the previous nonlinear matching pursuit. 



instantaneous frequencies are not well separated, we modify our decomposition method to 
solve the following optimization problem: 



M 



M 



mm 7 



WakWii + \\f(t) - y^afccosgfcllp 1 , s.t. a k cos9 k eV, 



(49) 



fc=i 



k=l 



where 7 > is a regularized parameter and a k is the representation of a k in V(9k) space. 

This problem is much more difficult to solve than the original one, since the different 
components may have strong correlation. Based on the I 1 regularized nonlinear matching 
pursuit that we introduced in the previous sections, we have developed a new method to 
solve the above optimization problem. The detail of this method will be reported in our 
subsequent paper. Here we give an example to demonstrate that this new method has the 
capability to deal with the signal with poor scale separation. 

Fig [12] gives the results obtained by our new method for the signal given in ([48 p . We 
can see that both the instantaneous frequencies and IMFs match the exact ones pretty well. 
These results are much better than those given in Fig. [TTJ 

In the previous example we consider, the scale separation is destroyed near the region 
where instantaneous frequencies of two different IMFs intersect each other. In the following 
example, we consider another example with poor scale separation, but for a different reason. 
In this case, the frequencies of different IMFs are well separated, but the instantaneous 
frequency of an IMF is not smooth any more. Specifically, we consider the following example: 



/ = 

1.5 + cos 2irt 

40tt, < t < 0.3, 




60vr, 0.3 < t < 1 



+ cos 6»i + cos 2 , 0i (0) = 6» 2 (0) = 0, 

, _ f 140tt, < t < 0.6, 
2 ~ \ 160vr, 0.6 < t < 1. 



(50) 



In this example, the instantaneous frequencies are discontinuous at t = 0.3 and t = 0.6 
respectively. Even for such signal, our I 1 regularized nonlinear matching pursuit method 
still gives a reasonable decomposition even if the signal is polluted with noise. 
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Figure 12: Left: Instantaneous frequencies; Middle (first component) and right (second 
component): IMFs obtained by extracting two IMFs together, red: exact results; blue: 
numerical results. 
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Figure 13: Instantaneous frequencies and IMFs given by the I regularized nonlinear 
matching pursuit for the data given in (150)) . Left: Instantaneous frequencies; Right: IMFs 
obtained by our method, red: exact results; blue: numerical results. 
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Figure 14: Instantaneous frequencies and IMFs given by I regularized nonlinear matching 
pursuit for the data f(t) + 0.3X(t), where / is given in (|50p and X(t) is Gaussian noise 
with standard derivation a 2 = 1. Left: Instantaneous frequencies; Right: IMFs obtained 
by our method, red: exact results; blue: numerical results. 

As we can see from Fig. \13\ there are some oscillations due to the Gibbs phenomenon 
near the points of discontinuity (t = 0.3,0.6). IMFs also have some errors near these two 
points. But in the region away from these two points, the numerical results match very well 
with the exact ones. These results also suggest that our I 1 regularized nonlinear matching 
pursuit has temporal locality property. The error is confined in a small region near the 
points where scale separation is poor. This property will be discussed further in Section 
6. When the noise is added to this signal, the results we obtain still have a reasonable 
accuracy, see Fig. [T51 

5.2 Numerical results for data with intra-wave frequency modulation 

In some physical applications such as the stokes waves or some nonlinear dynamic systems, 
we need to deal with data that have intra-wave frequency modulation. For this type of 
signals, they have the sparse decomposition: 



where afc are smooth envelopes, but their instantaneous frequencies, 9' k , are not smooth 
any more. Typically, the phase function has the form 0k = 4>k + e cos (ujk4>k), where 4>k 
is a smooth function, e > is a small number and uik > 1 is an integer. In the study of 
some nonlinear waves or dynamical systems, uik is an important parameter related to the 
characteristic of the nonlinearity of the system. 

An essential difficulty for this type of data is that the instantaneous frequency, 8' k , is 
as oscillatory as cos 9^ or even more oscillatory. In the nonlinear matching pursuit method 



M 




(51) 



k=l 
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that we proposed in the sections, we assume that a& and bk are smoother than cos 9k and 
use these two coefficients to update 9k- In the case where the signal has strong intra- wave 
modulation, 9' k is as oscillatory as cos9k- Thus our current method would not be able to 
give a good approximation of 9 k unless we are given a very good initial guess for 9 k- To 
overcome this difficulty, we introduce a shape function, Sk, to replace cosine function. The 
idea is to absorb the high frequency intra- wave modulation into the shape function s&. This 
will ensure that 9' k is smoother than Sk{9k)- This idea has been proposed by Hau-tieng 
Wu in [29], but efficient algorithm to compute the shape function has not appear in the 
literature. 

Note that s& is not known a priori and is adapted to the signal. We need to learn s/% from 
the physical signal. This consideration naturally motivates us to modify the construction 
of the dictionary as follows: 

M. = {a^SkiPk) '■ a k,9'k £ V(9k), Sk is 2-7r-period function} , (52) 

where Sk is an unknown 27r-periodic 'shape function' and is adapted to the signal. If we 
choose Sk to be cosine function, then A4 = T>. 

We have developed an efficient method based on our I 1 regularized nonlinear matching 
pursuit to recover the components with intra-wave frequency modulation by looking for the 
sparsest decomposition using this new dictionary M.. By exploring the fact that Sk is a pe- 
riodic function of 9k, we can identify certain low rank structure of the signal. This structure 
enables us to extract the shape function from the signal. Once we get an approximation of 
the shape function Sk, we can use Sk to update 9k- This process continues until it converges. 
The detail of this method will appear in another paper. Here, we give several numerical 
examples to demonstrate the capability of our method. 

The first example is the solution of the Duffing equation. This is an important example 
to demonstrate the importance of the intra-wave frequency modulation. Our method can 
handle this signal even with noise. 

The Duffing equation is a nonlinear ODE which has the following form: 

^-4 + u + eu 1+w = 7 cos(/3t). (53) 
dt z 

The parameters, e,7,w, that we use here to generate the solution in Fig. [15l are the same 
as those in the paper |14j . e = —1, 7 = 0.1, j3 = ^ and oj = 2. The initial condition is 
u(0) = it'(0) = 1. 

In Fig. fT5"l we plot the shape function that we obtain from the solution of the Duffing 
equation. In this example, we can express Sk{9k) m terms of cos9k, from which we can 
recover the instantaneous frequency of the signal. More interestingly, from the Fourier 
coefficients of the shape function, we can recover the information regarding the nonlinearity 
of the Duffing equation, which is u) = 2 in this case. We refer to a recent work of Prof. 
Norden Huang for more discussions of how to extract the degree of nonlinearity using EMD 
(see Dr. Huang's lecture in the IMA Hot Topic Workshop on Trend and Instantaneous 
Frequency, September 7-9, 2011, IMA). 
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Figure 15: Left: the solution of duffing equation; Middle: the shape function s; Right: the 
Fourier coefficients of s. 
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Figure 16: Left: the solution of duffing equation with noise X(t); Middle: the shape 
function s; Right: the Fourier coefficients of s. 

We also add Gaussian noise X(t) with covariance a 2 = 1 to the original solution of the 
Duffing equation. Fig. [16] shows the corresponding results. We can see that the shape 
function extracted from the noisy signal still keeps the main characteristics of the shape 
function extracted from the signal without noise. We can also clearly extract the degree of 
nonlinearity, uj = 2, even with such large noise perturbation to the solution of the Duffing 
equation. 

6 Some preliminary error analysis for the data with scale 
separation 

In this section, we perform some preliminary error analysis for our nonlinear matching 
pursuit method. To guarantee uniqueness of the decomposition, we need to impose certain 
scale separation property for the data that we try to decompose. Before we state our result, 



31 



we first define what we mean by scale separation for a given signal. 



Definition 6.1 (Scale-separation) One function f(t) = a(t) cos 9{t) is said to satisfy a 
scale- separation property with a separation factor e > 0, if a(t) and 9{t) satisfy the following 
conditions: 



a(t) eC\R), 9 G C 2 QR), 

inf 9'{t) > 0, M = sup \9"{t)\ < oo 



a'(t) 



?'(t) 



r>(t) 



(my 



< e, Vt G R. 



Definition 6.2 (Well-separated signal) ^4 signal f : M — > M. is said to be well-separated 
with separation factor e and frequency ratio d > 1 if it can be written as 



K 



f{t) = Y j a k {t) cos9 k {t) 



k=l 



where all fk(t) = a&(i) cos 9k(t) satisfies the scale- separation property with separation factor 
e, and their phase function 9^ satisfies 



9' k (t)>d9' k _ l (t), VtG 



(54) 



Theorem 6.1 Let f(t) be a function satisfying the scale-separation property with separation 
factor e and frequency ratio d as defined in Definition \6.S[ Choose a low-pass filter 4> such 
that its Fourier Transform <fi has support in [—A, A] with A < J^j 2 an< ^ ( t ) Q t ) = 1, VA; G 
[— A/2, A/2]. If in the nth step, the approximate phase function 9 ko satisfies the following 
condition: 



A 

< 2' 



(55) 



'k J (*) 

then the accuracy of the phase function in the next step is order e, i.e. 

9 n + 1 (t)-9 k0 (t)\=O(e). 

In order to prove the above theorem, we need the following lemma: 
Lemma 6.1 For any a{t) G C^R), 9 G C 2 (R), we have 



(56) 



{T)e- ie{T) 4>{t - t)dr - a(t)e- ie{t) 4>(9'{t)) 
where I n = J \t n 4>(t)\dt . 



< sup \a'(t)\h + -\a(t)\ sup \9"{t)\I 2 , (57) 
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Proof The proof follows from the following direct calculations: 
a(T)e- ie{r) 4>(t - t)dr - a{t)e~ im ^ (0'(i)) 

(a(r)-a(t))e l9 ^ 4>(r-t)dr + a(t) j ( e -*M - e -*(*(*)-*'(*)(r-t))) ^ T - t )dr 
{air) - a(t)) e^ cp(r - t)dr + a(t) | (e-W-W-e'WfT-t)) _ ^ e -i(0(*)-0'(*)(r- t )) 0(r _ t){fa 

^-^'(•(r))(r-t)» _ ^ e -i(tf(t)+9'(t)(r-t))^ (r _ ^ r 



< sup \a' (t)\ j \r(j)(T)\dT + \a(t)\ 

< sup|o'(t)| / \T(j>(r)\dT+ \a(t)\ 



-9"(s(r))(r-tf^T-t) 



< sup|o'(t)| / \T(j){r)\dT + -\a{t)\ sup \6"{t)\ / |r 2 0(r)|dr 



f/r 



sup \a'{t)\h + ~\a{t)\ sup \0"(t)\I 2 



(58) 



□ 



Remark 6.1 We remark that since we typically deal with data of finite support and extend 
them periodically to the whole domain, the estimates for 1\ and I2 in the above lemma are 
effectively taken only in the finite support of the data. 

Corollary 6.1 If the Fourier Transform of the low-pass filter (j) is symmetric, i.e. 4>(k) = 
4>{—k), then we have 



) cos (0(r)) 4>(t - t)dr - a(t) cos 6(t)<i> (0'(t)) 
o(t) sin (0(r)) 0(t - t)dr - a{t) sin 6(t)<j> (0'(t)) 



< sup \a'(t)\h + -\a(t)\ sup \6"{t)\I 2 (59) 

< sup \a'{t)\h + -\a(t)\ sup |0"(£)|I 2 .(6O) 



Now we can prove Theorem 16.11 Here we only give a sketch of the proof, the detail of the 
proof is deferred to the Appendix. 

Proof of Theorem 16.11 In order to simplify the notation, we denote = 0£ Q and and 

define (•) as the mapping from t to 0, i.e. f(9) = f(t), V/. In our algorithm, we update 
0^ o 1 " 1 by the following step 



9? +1 = 8- arctan . 
k ° \a(t) 



bit) 



, a{t) = A(0(t)), b(t) = B(0(t)), 



where 



A( 7 ) = 2 j f{6) cos(0)0 (0 - 7) d9, B( 7 ) = 2 J f(9) sin(0)0 (0 - 7) 



dO. 



(61) 



(62) 
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We will prove that a(t) and b(t) satisfy the following estimates: 



a(t) 

m 



A (9(t)) = a ko (t) cos (0 ko (t) -9) + 0(e). 
B (9(t)) = a ko (t) sin (9 ko (t) -6) + 0(e). 



Then, we can get 



which implies that 



A9 = arctan = 9 ko (t) - 9 + 0(e), 



(63) 
(64) 



(65) 



(66) 



(9 - 7) d9 



First, let us estimate -A (7) as follows: 

A( 7 ) = 2 / cos (0)0 (0 - 7) d6 = 2 J2 f a k (0) cos fc (t) cos( 

J k=l •* 

n . n „ 

= Z) / ^ cos ( 0fc ^ +~e)4>(p-i)dO + Ys 3 *® cos - 0) - 7) ^ 

fc=l fc^fco 



fc=l 

+ 



5 fco (0)cos (^ (i)-5) 0(0-7) ^ 



I + 11 + III. 



(67) 



For I and II, the scale separation assumption of the data implies that a k (9) cos (6 k (t) + 6) 
and a k (9) cos (0fc(i) — 0) (k 7^ fco) are more oscillatory than the kernel 0(0). Thus we expect 
that these two terms are small after convolution with a smooth kernel. In fact, we can prove 
that I, II = 0(e) by Corollary [O 

The estimate for III is more 

This implies that afc o (0)cos (9 ko (t) — 9j is smoother than the kernel (f)(9). Using Corollary 
that III = a k() (9) cos (9 ko (t) — 9) + 0(e). By combining these results, we 



difficult. By our assumption, we have 



(*) 



e> (t) 



< 



2 ■ 



\6.1\ we can prove 
get 



a(t) = A (9(t)) = a ko (t) cos (9 ko (t) -9)+ 0(e). 
Similarly, we can prove the estimate for b(t) 

b(t) = B(9(t)) =a k0 (t) sin (9 ko (t) -9) + 0(e). 
This completes the proof. 



(69) 
□ 
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Figure 17: IMFs with different factors of scale separation. Left: poor scale separation; 
Right: good scale separation. 



Remark 6.2 Under the same assumption, we can prove that the error of the instantaneous 
frequency is also order e, i.e. 



fe 



{t)-e' k0 {t) 



0(e) 



(70) 



The argument is almost the same as the above proof, except that the calculation is a little 
more involved. 

From the above theorem, we can see that the accuracy of our method depends on the factor 
of scale separation. This is also consistent with our numerical results. In the following 
numerical example, we compare the IMFs obtained by our method for two different signals. 
One has poor scale separation, the other one has better scale separation. The signals 
are given by ([7T|) . The signal f 2 has a better scale separation property than f\ since its 
instantaneous frequency is twice of that of f\. As shown in Fig. [T71 the error we obtain for 
fi is considerably smaller than that of f\. 

1 



a (t) = ai(t) 



9 = 10sin(2vrt) +40vrt. 



1.1 + cos(27rf) ' 
= a o (i) + ai(*)cos0(t), f 2 (t) =a o (i) + ai(i)cos(20(t)). 



(71) 



We would like to point out that the error estimate given by Theorem 16. II is highly over- 
estimated. In Fig. [TTl we can see that even for the signal with a poor scale separation 
property, the instantaneous frequency we obtain is still reasonably accurate, although the 
corresponding scale separation factor e « 1.4 is quite big according to Definition 16. ip . 
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In the estimate of Lemma [6. 11 instead of taking the supreme over R, we can take supreme 
over a finite interval, since we can choose a low-pass filter <j> that decays exponentially fast. 
Then, we can get a more local estimate: 



a(r)e ie ^ 0(t - t)dr - a(t)e ie ^'(f (0'(t)) 



*(*); 



< sup |o'(t)|/i + \\a(t)\ sup |0"(i)|I 2 , (72) 



where S (f> = {t£R: \(j){t)\ > e}. 

This seems to suggest that for the signal that does not have a good scale separation 
property in some region, its influence on the accuracy of the decomposition is limited to 
that region. This is the temporal locality property we have mentioned in Section 5.1. This 
property can be also seen in Fig. [TTJ In this example, the scales are not well separated in 
the center of the interval. However, the scales are better separated near the two ends. We 
can see that the error near the boundary is much smaller than that in the center. 

From this analysis, we can also see that the low-pass filter with smooth Fourier spectrum 
(such as the cosine function given by (|31|)) would perform better than that with discon- 
tinuous spectrum (such as the stair function given by (|30|) ) in terms of maintaining the 
temporal locality property of the decomposition. The low-pass filter with discontinuous 
spectrum decays much slower in the time domain due to the Gibbs phenomena. This is 
why we use the cosine low-pass filter instead of the stair one. 

Theorem 16.11 tells us that if we have a good initial guess, then there is no need to do 
iterations. But in most cases, we have only a rough initial guess, and the condition in 
Theorem 16.11 may not be satisfied. In this case, the iterative procedure in our algorithm 
improves the result gradually as the number of iterations increases. It will provide a good 
approximation after a number of iterations. In Fig. \18\ we show how the iteration can 
improve the approximation of the instantaneous frequency for a simple chirp signal: f(t) = 
cos(10vr(3t + l) 2 ). 

In this example, the initial guess for the instantaneous frequency is a constant, 9q = 807ri. 
Near the intersection of 9' = 80ir and the exact instantaneous frequency, the initial guess is 
relatively good. After one step, we can get a better approximation in this local region. This 
new estimate gives us a better guess for the instantaneous frequency in a slightly larger 
interval that contains the good interval given by the initial guess. Then in the next step, 
we can get a good approximation in an even larger interval. Gradually, we can get an 
accurate approximation of the instantaneous frequency in the whole interval. Fig. [18] plots 
the approximate instantaneous frequency in different steps. As the number of iterations 
increases, the region in which we have a good approximation becomes larger and larger. 
Finally, the iterative algorithm produces an accurate instantaneous frequency in the entire 
domain. 
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Figure 18: Instantaneous frequency at different step. 

7 Conclusion 

In this paper, we introduce a new data-driven time-frequency analysis method based on the 
nonlinear matching pursuit. The adaptivity of our decomposition is obtained by looking for 
the sparsest representation of signals in the time-frequency domain from a largest possible 
dictionary that consists of all possible candidates for intrinsic mode functions (IMFs). Solv- 
ing this nonlinear optimization problem is in general very difficult. We propose a nonlinear 
matching pursuit method to solve this nonlinear optimization problem by generalizing the 
classical matching pursuit for the I optimization problem. One important advantage of 
this nonlinear matching pursuit method is it can be implemented very efficiently. Further, 
this approach is very stable to noise. For data with good scale separation property, our 
method gives an accurate decomposition up to the boundary. 

We have also carried out some preliminary theoretical study for the nonlinear optimiza- 
tion method proposed in this paper. In the case when the signal satisfies certain scale 
separation conditions, we show that our iterative algorithm converges to an approximate 
decomposition with the accuracy determined by the scale separation factor of the signal. 

There are some remaining issues to be studied in the future, such as data with poor 
scale separation property, data with intra-wave frequency modulation, the so-called 'end 
effect' of data, and data with incomplete or sparse samples and so on. We have addressed 
these issues to some extent in this paper, but much more work need to be done to resolve 
these challenging issues. 
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Another direction is to generalize this adaptive data analysis method to high dimensional 
data. In some physical applications such as propagation of nonlinear ocean waves, each 
wave form has a dominating propagation direction. In this case, our method has a natural 
generalization by adopting a multi-dimensional phase function. 



Appendix 



Proof of Theorem 16.11 In order to simplify the notation, we denote 9 = 9^ o and use (• 

to represent the mapping from t to 9, i.e. f(9) = f(t), V/. 



According to our algorithm, we update 9? +1 as follows: 



9? +1 = 9- arctan ( ^| 



, a(t) = A(9(t)), b(t) = B(9(t)), 



(73) 



where 

A(i) = 2 J cos(5)0 (9 - 7 ) dS, B( 7 ) = 2 y J(9) sin(0)0 (5 - 7 ) (74) 
We first estimate A( 7 ) as follows: 

A( 7 ) = 2 / 7(5) cos(0)<£ (5 - 7 ) d0 = 2 ^ / 5 fc (0) cos fc (t) cos(#)0 (5 - 7 ) S. 
For k 7^ /co, we have 

2 y S fc (5) cos 9 k (t) cos(#)0 (5 - 7 ) dO 

a k (9) (cos (9 k (t) + 9) + cos (6 k (t) - 9) ) (5 - 7 ) d0. (75) 



Since 



dafe(^) 



"*(*) 



< e 



'(0 



(76) 



we obtain 



^(9 k (t)±9) 
d9 



9u¥(t)-0' k (¥'(t) 



< e 



m)f - o' k (t)9' (t) 



forfe/fco- (77) 



Now we apply Corollary 16. II for k ^ ko to obtain 
2 /" a fc (5) cos 9 k (t) cos (0)0 (9 - </>) d9 

>*(*) 



a fc (0) cos (0 fc (t) + 5) 4^ + 1 + a fc (0) cos (0 fc (i) - 9) <f>\ Jfi-L - 1 + O(e). (78) 
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Using the condition A < f + ^ 2 , 0' k (t) > and 



o\t) 



< -g-, we get 



^-1 = MP_ 1>rf(1 _ A/2) _ 1>A , if , >fc0 , 

e\t) 0ko(t) e\t) 

7(f) 

$^ + 1 > 1>A. 

7(t) 



fl (!\ £/Y - K (1 + A/2)/d - 1< -A, ifk<k , 



Since the support of is within [—A, A], we have 

2 J a k (9) cos 9 k (t) cos(0)0 (5 -</>)$ = 0(e). 
For = fco, we proceed as follows 

2 y a k (6) cos fe (i) cos(#)0 (5 - <p) d~6 

a ko (S) (cos (e kQ (t) + e)+ cos (%,(t) - e)) 4> (e - </>) S. 

Similarly, by using the assumption 



da ko (9) 



(16 



we obtain the following estimates: 



d 



(t) 



< e 



9(t) 



j (MO*" 



d 



'*,(*) 



'(*) 



+ 1 



'(t) 



> 1 > A, 



A 

< 2' 



d 2 

-=2 (^feo(*) ± 
do 



9' k ' (t)9'(t)-e> ko (t)9"(t) 



'(*) 



< e 



By applying Corollary 16.11 again, we get 
2 / a fc (0) cos fc (i) cos (#)</> (9 - (f>) d9 



a ko (9) cos (9 ko (t) + 9)$( + 1 j + a k0 (9) cos (0 ko (t)-9)^( 



(t) 

a ko (9) cos (9 ko (t)-9)+0(e). 



39 



Finally, we get the following estimate for a(t), 

a(t) = A (9(t)) = a ko (t) cos (6 ko (t) -9)+ 0(e). (89) 



For b(t), we can obtain a similar estimate 



b(t) = B (0(t)) = a k0 (t) sin (6 k0 (t) -9)+ 0(e). 



(90) 



Thus, we have 



A9 = arctan 




= e k0 (t)-e + o(e), 



(91) 



which implies that 



C(*)"M*) =0(e). 



(92) 



This completes the proof. 



□ 
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